Maciej Zasada's profile

Closer - Apple Vision Pro AI voice communicator

TL;DR
We have developed an Apple Vision Pro voice communicator app tailored to the spatial user interface of visionOS and to the conditions in which social apps on Apple Vision Pro are typically used: at home, in comfort, and when time permits.

The result is an experience in which each received audio message is analyzed by AI and visualized as a 360-degree skybox that surrounds the user while they listen. This truly brings the sender and receiver closer together.
Our passion for experimental user interfaces and our drive to discover new and improved ways of utilizing technology compelled us to envision how a personal voice communicator could function and appear on Apple Vision Pro.
Research
We initiated the project by analyzing the current functionality of voice messaging. The primary applications that support voice messaging features include WhatsApp, Messenger, and Instagram.

However, because a mobile messenger is used in very different circumstances from the Apple Vision Pro headset, we had to think completely outside the box.

The headset is typically used at home, when people have time to spare. Starting with a blank slate, we held a single goal in mind: to use the spatial interface and the comfort of a home setting to conceive and build a genuinely intimate, personal voice communicator that would forge a closer connection between two people.
Concept
This is how the concept of Closer was born: a spatial communicator designed for intimacy, aiming to bring people closer even when they are physically apart.

Our ambition was to use artificial intelligence (AI) to visualize the meaning of a received audio message and immerse the listener in what the sender had in mind while recording it.
Dashboard view
We aimed to give the user a simple, intuitive way to navigate the app. The dashboard view uses a side tab bar to switch between the inbox and recording mode. The UI is kept to a minimum so that it does not needlessly obstruct the user's view.
Recording a message
Taking advantage of the third spatial dimension, we built a simple, intuitive recording interface. An audio orb reacts to the user's microphone input, providing visual confirmation that sound is being captured.
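To illustrate how an audio orb can follow the microphone, here is a minimal sketch of the level-to-scale logic, written in Python for brevity rather than the Swift the app was built in. It assumes the app receives short buffers of float PCM samples (for example from an `AVAudioEngine` input tap), computes each buffer's RMS level in dBFS, and maps it to a scale with a fast attack and a slower release so the orb swells quickly and settles smoothly. The class name `OrbLevel` and all tuning values are hypothetical, not taken from Closer.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """RMS level of one buffer of float PCM samples in [-1, 1], in dBFS (-inf for silence)."""
    if not samples:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return -math.inf
    return 10.0 * math.log10(mean_square)  # same as 20 * log10(rms)


class OrbLevel:
    """Turns successive microphone buffers into a smoothed orb scale (hypothetical tuning)."""

    def __init__(self, floor_db: float = -60.0, attack: float = 0.5, release: float = 0.1,
                 min_scale: float = 1.0, max_scale: float = 1.5) -> None:
        self.floor_db = floor_db    # levels at or below this (negative) value count as silence
        self.attack = attack        # share of the gap closed per buffer while getting louder
        self.release = release      # smaller share while getting quieter, so the orb settles gently
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.level = 0.0            # smoothed, normalized level in [0, 1]

    def update(self, samples: Sequence[float]) -> float:
        db = rms_level_db(samples)
        # Normalize linearly in decibels: floor_db -> 0.0, 0 dBFS -> 1.0.
        target = 0.0 if db <= self.floor_db else min(1.0, 1.0 - db / self.floor_db)
        # Asymmetric one-pole smoothing: fast rise, slow decay.
        coeff = self.attack if target > self.level else self.release
        self.level += coeff * (target - self.level)
        return self.min_scale + (self.max_scale - self.min_scale) * self.level


if __name__ == "__main__":
    orb = OrbLevel()
    speech = [0.5, -0.5] * 240  # a loud, constant-amplitude test buffer
    for buffer in ([0.0] * 480, speech, speech, [0.0] * 480):
        print(round(orb.update(buffer), 3))
```

Working in decibels rather than raw amplitude makes quiet speech visibly move the orb, and the asymmetric smoothing avoids jitter between buffers.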
Viewing a message
Our aspiration was to forge an intensely personal and intimate connection between the sender and the receiver. Here's how we brought this vision to life:

Each received message is, at its core, recorded audio. It is transcribed into text and analyzed by AI, which then generates a spherical image that captures the message's content.

This image is then turned into a three-dimensional bubble, giving the recipient an immediate 'sense' of the message's essence.
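Showing a 360-degree image as a bubble, or surrounding the user with it, comes down to mapping the panorama onto a sphere. The sketch below assumes the generated image is an equirectangular (2:1 longitude/latitude) panorama, the usual format for 360-degree skyboxes, and uses a y-up, -z-forward convention. The function names are illustrative, not Closer's actual code: `direction_to_uv` returns the texture coordinate seen in a given view direction, and `uv_sphere` builds textured sphere vertices that a 3D engine could render.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
UV = Tuple[float, float]


def direction_to_uv(d: Vec3) -> UV:
    """Map a unit view direction (y up, -z forward) to equirectangular texture coordinates."""
    x, y, z = d
    u = 0.5 + math.atan2(x, -z) / (2.0 * math.pi)           # longitude -> horizontal position
    v = 0.5 - math.asin(max(-1.0, min(1.0, y))) / math.pi   # latitude -> vertical; v = 0 is straight up
    return u, v


def uv_to_direction(u: float, v: float) -> Vec3:
    """Inverse mapping: texture coordinates back to a unit direction."""
    lon = (u - 0.5) * 2.0 * math.pi
    lat = (0.5 - v) * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), -math.cos(lat) * math.cos(lon))


def uv_sphere(radius: float, rings: int, segments: int) -> List[Tuple[Vec3, UV]]:
    """Vertices (position, uv) of a latitude/longitude sphere textured with the panorama."""
    vertices = []
    for r in range(rings + 1):
        v = r / rings
        for s in range(segments + 1):  # extra seam column so u runs 0 -> 1 without wrapping
            u = s / segments
            x, y, z = uv_to_direction(u, v)
            vertices.append(((radius * x, radius * y, radius * z), (u, v)))
    return vertices
```

The mapping is written for a viewer at the sphere's center, as in Immersion mode. Seen from outside, as with the bubble, the same texture appears mirrored left to right; whether that matters for an abstract bubble is a design choice.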
visionOS Immersion mode
When a message is played back, we immerse the user in an AI-generated representation of its content. For instance, in one message Agnes shares a dream she had, vividly describing walking through a bustling street in India.
In another, Mike lets the recipient know that he plans to come over for dinner with his wife, and Immersion mode visualizes that message in the same way.
Attention to detail
While implementing Closer, we paid close attention to micro-animations that enrich the user experience.
Spatial app icon
We designed a spatial icon following the visionOS guidelines, so that it comes forward when the user looks at it.

The icon's design combines several ideas. The white shapes can be read as the letter 'C', matching the app's name, 'Closer.' They are also mirrored, symbolizing two users on opposite sides of the app. Where the two 'C's intersect, the shape resembles an eye. Lastly, the 'C' shapes evoke Immersion mode, wrapping around the user through 180 or even 360 degrees.
Technology
We developed the application using Xcode Beta, Swift, and SwiftUI.
We designed and animated 3D components, such as the bubble representation of the message, using custom shaders and shader graphs in Reality Composer Pro.
On the backend, we set up auto-scaling infrastructure on Google Cloud Platform, with a Google Cloud Run service hosting our backend API. Whenever a voice message was sent, it was routed through this backend and transcribed into text using Google Cloud Speech-to-Text. We then used OpenAI's ChatGPT API to analyze the message's semantic content and distill the key concepts the user expressed, and asked ChatGPT to turn these concepts into descriptive scenes. Finally, we fed these descriptions into Blockade Labs' AI-powered Skybox generator to produce immersive 360-degree scenes.
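The backend flow described above can be sketched as a single orchestration function. This is a minimal, hypothetical outline in Python: `Transcriber`, `LanguageModel`, and `SkyboxGenerator` are stand-in interfaces that, in a real service, would wrap Google Cloud Speech-to-Text, the OpenAI API, and Blockade Labs' Skybox API. Their actual request formats and the prompts Closer used are not shown here; the prompt wording below is illustrative only, and the fake clients exist so the flow runs without network access.

```python
from dataclasses import dataclass
from typing import List, Protocol


# Hypothetical interfaces; real implementations would wrap the cloud services.
class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class SkyboxGenerator(Protocol):
    def generate(self, scene_description: str) -> str: ...  # e.g. an image URL or ID


@dataclass
class ProcessedMessage:
    transcript: str
    concepts: List[str]
    scene: str
    skybox: str


def process_voice_message(audio: bytes, stt: Transcriber, llm: LanguageModel,
                          skybox: SkyboxGenerator) -> ProcessedMessage:
    # 1. Transcribe the voice message to text.
    transcript = stt.transcribe(audio)
    # 2. Ask the language model for the key concepts, one per line (illustrative prompt).
    raw = llm.complete("List the key concepts in this message, one per line:\n" + transcript)
    concepts = [line.strip("-* ").strip() for line in raw.splitlines() if line.strip()]
    # 3. Turn the concepts into a single descriptive scene (illustrative prompt).
    scene = llm.complete("Describe one immersive 360-degree scene combining: " + ", ".join(concepts))
    # 4. Generate the 360-degree skybox from the scene description.
    image = skybox.generate(scene)
    return ProcessedMessage(transcript, concepts, scene, image)


# Offline fakes standing in for the real services (hypothetical, for demonstration).
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return "I dreamed I was walking through a bustling street in India."


class FakeLLM:
    def complete(self, prompt: str) -> str:
        if prompt.startswith("List"):
            return "- bustling street\n- India\n- dream"
        return "A crowded, colorful street in India at dusk, hazy and dreamlike."


class FakeSkybox:
    def generate(self, scene_description: str) -> str:
        return "skybox-001"


if __name__ == "__main__":
    msg = process_voice_message(b"", FakeSTT(), FakeLLM(), FakeSkybox())
    print(msg.concepts)  # ['bustling street', 'India', 'dream']
    print(msg.skybox)    # skybox-001
```

Injecting the three services as interfaces keeps each step replaceable and lets the pipeline be tested offline. In a Cloud Run deployment, an HTTP handler would presumably call a function like `process_voice_message` and store the result for the recipient's device.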
Get in touch to discuss the possibility of launching your brand on Apple Vision Pro today:
